
Collaborating Authors: sign information


Find A Winning Sign: Sign Is All We Need to Win the Lottery

Oh, Junghun, Baik, Sungyong, Lee, Kyoung Mu

arXiv.org Artificial Intelligence

The Lottery Ticket Hypothesis (LTH) posits the existence of a sparse subnetwork (a.k.a. winning ticket) that can generalize comparably to its over-parameterized counterpart when trained from scratch. The common approach to finding a winning ticket is to preserve the original strong generalization through Iterative Pruning (IP) and transfer information useful for achieving the learned generalization by applying the resulting sparse mask to an untrained network. However, existing IP methods still struggle to generalize their observations beyond ad-hoc initialization and small-scale architectures or datasets, or they bypass these challenges by applying their mask to trained weights instead of initialized ones. In this paper, we demonstrate that the parameter sign configuration plays a crucial role in conveying useful information for generalization to any randomly initialized network. Through linear mode connectivity analysis, we observe that a sparse network trained by an existing IP method can retain its basin of attraction if its parameter signs and normalization layer parameters are preserved. To take a step closer to finding a winning ticket, we alleviate the reliance on normalization layer parameters by preventing high error barriers along the linear path between the sparse network trained by our method and its counterpart with initialized normalization layer parameters. Interestingly, across various architectures and datasets, we observe that any randomly initialized network can be optimized to exhibit low error barriers along the linear path to the sparse network trained by our method by inheriting its sparsity and parameter sign information, potentially achieving performance comparable to the original. The code is available at https://github.com/JungHunOh/AWS_ICLR2025.git
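The transfer recipe the abstract describes, reinitializing weights at random and then inheriting the trained sparse network's mask and per-parameter signs, can be sketched in a few lines. This is a minimal illustration with plain Python lists, not the authors' actual training pipeline; the function name and the initialization range are our own assumptions:

```python
import random

def transfer_sign_and_mask(trained, mask, rng):
    """Re-initialize each weight with a fresh random magnitude, then inherit
    the sparsity mask and the per-parameter sign of the trained network."""
    new_weights = []
    for w, m in zip(trained, mask):
        fresh = rng.uniform(0.0, 0.1)        # random magnitude (hypothetical init range)
        sign = 1.0 if w >= 0 else -1.0       # sign inherited from the trained network
        new_weights.append(sign * fresh * m)  # mask zeroes out pruned weights
    return new_weights

rng = random.Random(0)
trained = [0.8, -0.3, 0.05, -0.9]
mask = [1, 1, 0, 1]                          # third weight was pruned
init = transfer_sign_and_mask(trained, mask, rng)
```

The resulting `init` keeps the trained network's sparsity pattern and sign configuration while discarding its magnitudes, which is exactly the information the paper argues is sufficient to stay in the same basin of attraction.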


Reviews: Full-Gradient Representation for Neural Network Visualization

Neural Information Processing Systems

Updates based on author feedback: Given that the authors added the digit flipping experiments and obtained good results (albeit with different choices of post-processing) I am increasing my score to a 7. However, ***the increased score is based on good faith that the authors will add the following to their paper***: (1) I think it would be extremely helpful to practitioners if the authors exposed how different choices of post-processing affected the results. What happens to the digit flipping experiments when sign information is discarded? What happens to Remove & Retrain when sign information is retained? Please be as up front as possible about the caveats; practitioners should be made aware that the choice of post-processing is something they need to pay close attention to, or the method may be used in a way that gives misleading results (as mentioned, we have seen this happen before with Guided Backprop).
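The reviewer's concern is easy to make concrete: whether post-processing keeps or discards attribution signs can change which input features look most important. A minimal sketch (the `postprocess` helper is hypothetical, not FullGrad's actual code):

```python
def postprocess(saliency, keep_sign):
    """Two common post-processing choices for gradient-based saliency maps:
    keep the sign of each attribution, or discard it via abs()."""
    if keep_sign:
        return list(saliency)
    return [abs(s) for s in saliency]

raw = [0.5, -0.2, 0.0, -0.9]                 # toy per-pixel attributions
signed = postprocess(raw, keep_sign=True)
unsigned = postprocess(raw, keep_sign=False)
```

With signs kept, the strongest positive evidence is the 0.5 pixel; after taking absolute values, the pixel with attribution -0.9 (evidence *against* the class) dominates the map, which is why evaluations like digit flipping can give different results under different post-processing.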


Compressing Sign Information in DCT-based Image Coding via Deep Sign Retrieval

Suzuki, Kei, Tsutake, Chihiro, Takahashi, Keita, Fujii, Toshiaki

arXiv.org Artificial Intelligence

The discrete cosine transformation (DCT) [1] is known as an important technique for image coding and is adopted in various image coding standards [2, 3, 4, 5, 6, 7, 8, 9]. For instance, JPEG [2] first divides an original image into non-overlapping blocks and then applies DCT to each of the blocks, followed by quantization. Entropy coding is finally performed to obtain bit representations for the quantized DCT coefficients. According to source coding theory [10], statistically biased symbols can be efficiently compressed using entropy coding methods such as [11, 12, 13, 14]. However, the sign information of DCT coefficients has equiprobable characteristics [15, 16, 17], i.e., the probabilities of the positive and negative signs are almost even, and the compression of the sign information has thus been considered impossible. Therefore, each of the signs is represented using 1 bit in typical image coding methods; the sign information consumes many bits in the resulting bitstream. To reduce the bit amount for the signs, we address a sign compression problem for DCT coefficients in this paper. In particular, we consider a lossless sign compression problem, where the signs of the DCT coefficients are decoded without loss. We briefly summarize seminal works developed to tackle this challenging problem.
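The equiprobability claim can be reproduced empirically: for typical inputs, DCT coefficient signs split close to 50/50, so their empirical entropy approaches 1 bit per sign and plain entropy coding gains almost nothing. A minimal sketch using a naive unnormalized 1-D DCT-II on synthetic blocks (helper names are ours, not from the paper):

```python
import math
import random

def dct2(x):
    """Naive unnormalized 1-D DCT-II, as applied block-wise in JPEG-style coders."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi / N * (n + 0.5) * k) for n in range(N))
            for k in range(N)]

def sign_entropy(coeffs):
    """Empirical entropy (bits per sign) of the signs of nonzero coefficients.
    Close to 1 bit when positive and negative signs are equiprobable."""
    signs = [c > 0 for c in coeffs if c != 0]
    p = sum(signs) / len(signs)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

rng = random.Random(0)
blocks = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
coeffs = [c for block in blocks for c in dct2(block)]
H = sign_entropy(coeffs)  # close to 1 bit per sign
```

Since the empirical sign entropy is nearly the full 1 bit, a symbol-by-symbol entropy coder cannot shrink the signs; this is the motivation for side-information approaches such as the sign retrieval method in the paper.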


Synthesis of Gaussian Trees with Correlation Sign Ambiguity: An Information Theoretic Approach

Moharrer, Ali, Wei, Shuangqing, Amariucai, George T., Deng, Jing

arXiv.org Machine Learning

The goal of any inference algorithm is to recover the hidden parameters related to those k hidden nodes (k may be unknown). Consider a special subset of graphical models, known as latent Gaussian trees, in which the underlying structure is a tree and the joint density of the variables is captured by a Gaussian density. Gaussian graphical models are widely studied in the literature because of a direct correspondence between the conditional independence relations occurring in the model and zeros in the inverse of the covariance matrix, known as the concentration matrix. Several works, such as [1,2], have proposed efficient algorithms to infer the latent Gaussian tree parameters. In fact, Choi et al. proposed a new recursive grouping (RG) algorithm along with its improved version, the Chow-Liu RG (CLRG) algorithm, to recover a latent Gaussian tree that is both structurally and risk consistent [1]; hence it recovers the correct values for the latent parameters. They introduced a tree metric, defined as the negative log of the absolute value of the pairwise correlations, to perform the algorithm. Also, Shiers et al., in [3], characterized the correlation space of latent Gaussian trees and showed the necessary and sufficient conditions under which the correlation space represents a particular latent Gaussian tree. Note that the RG algorithm can be directly related to the correlation space of latent Gaussian trees in the sense that it recursively checks certain constraints on correlations to converge to a latent tree with the true parameters.
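The tree metric mentioned above, d = -log|rho|, is what makes recursive grouping work: in a Gaussian tree, pairwise correlations factorize along paths, so the metric is additive, and taking the absolute value makes it invariant to the sign ambiguity of hidden nodes. A small numeric check (the toy correlations are illustrative, not from the paper):

```python
import math

def tree_metric(rho):
    """Tree metric from the RG algorithm: d = -log |rho|.
    Additive along paths because correlations multiply in a Gaussian tree."""
    return -math.log(abs(rho))

# Toy latent tree: i -- h -- j, with h hidden. Edge correlations:
rho_ih, rho_hj = 0.6, -0.5     # flipping h's sign flips both edge correlations
rho_ij = rho_ih * rho_hj       # leaf-to-leaf correlation factorizes along the path

# Additivity: d(i,j) = d(i,h) + d(h,j)
assert abs(tree_metric(rho_ij) - (tree_metric(rho_ih) + tree_metric(rho_hj))) < 1e-12
```

Because the metric depends only on |rho|, a global sign flip at the hidden node h leaves every pairwise distance unchanged, which is precisely the correlation sign ambiguity the paper studies.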